Are All Commas Equal? Detecting Coordination in the Penn Treebank

Authors

  • Wolfgang Maier
  • Sandra Kübler

Abstract

Coordination has always been a difficult phenomenon with regard to linguistic analysis, manual annotation, and automatic analysis. There is a considerable body of work on detecting coordination and on improving parsing for this phenomenon. However, most approaches were restricted to certain types of coordination, such as NP coordination or symmetrical coordination. We present the first approach to classifying punctuation marks according to whether or not they function as separators between conjuncts in coordination. We show that by using information from a parser in combination with context information, we reach an F-score of 89.22 on the positive cases.
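The abstract frames the task as a binary decision for each punctuation mark, based on parser output plus surrounding context, and it reports F-score on the positive class only. The paper's actual feature set and classifier are not described here, so the following is only an illustrative sketch under stated assumptions. The function names `punctuation_features` and `positive_f1`, the two-token POS window, and the `parser_siblings` input (the labels of the nodes a parser attached alongside the punctuation mark) are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Feature = Union[str, bool]


def punctuation_features(
    tags: Sequence[str], i: int, parser_siblings: Sequence[str], window: int = 2
) -> Dict[str, Feature]:
    """Hypothetical features for the punctuation token at position i.

    tags: POS tags of the sentence (Penn Treebank tag set).
    parser_siblings: labels of the nodes the parser attached next to the
    punctuation mark (an assumed stand-in for "information from a parser").
    """
    feats: Dict[str, Feature] = {"punct": tags[i]}
    # Context window: POS tags to the left and right, padded at sentence edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"tag[{offset:+d}]"] = tags[j] if 0 <= j < len(tags) else "<PAD>"
    # Parser feature (assumed): is a coordinating conjunction (CC) a sibling?
    feats["parser_sibling_CC"] = "CC" in parser_siblings
    return feats


def positive_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 on the positive class (True = conjunct separator)."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # "apples , pears and plums": the comma separates two conjuncts.
    tags = ["NNS", ",", "NNS", "CC", "NNS"]
    print(punctuation_features(tags, 1, ["NNS", "NNS", "CC", "NNS"]))
    print(positive_f1([True, True, False, False], [True, False, True, False]))
    # -> (0.5, 0.5, 0.5)
```

Scoring only the positive class means that correctly rejecting non-separator punctuation earns no credit, so the number reflects how well the separators themselves are found.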

Similar resources

Chinese sentence segmentation as comma classification

We describe a method for disambiguating Chinese commas that is central to Chinese sentence segmentation. Chinese sentence segmentation is viewed as the detection of loosely coordinated clauses separated by commas. Trained and tested on data derived from the Chinese Treebank, our model achieves a classification accuracy of close to 90% overall, which translates to an F1 score of 70% for detectin...

Modeling Comma Placement in Chinese Text for Better Readability using Linguistic Features and Gaze Information

Comma placements in Chinese text are relatively arbitrary although there are some syntactic guidelines for them. In this research, we attempt to improve the readability of text by optimizing comma placements through integration of linguistic features of text and gaze features of readers. We design a comma predictor for general Chinese text based on conditional random field models with linguisti...

Coordination Annotation Extension in the Penn Tree Bank

Coordination is an important and common syntactic construction which is not handled well by state of the art parsers. Coordinations in the Penn Treebank are missing internal structure in many cases, do not include explicit marking of the conjuncts and contain various errors and inconsistencies. In this work, we initiated manual annotation process for solving these issues. We identify the differ...

Annotating Coordination in the Penn Treebank

Finding coordinations provides useful information for many NLP endeavors. However, the task has not received much attention in the literature. A major reason for that is that the annotation of major treebanks does not reliably annotate coordination. This makes it virtually impossible to detect coordinations in which two conjuncts are separated by punctuation rather than by a coordinating conjun...

Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don’t do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl),...

Publication date: 2013